Revisiting Direct Tag Search Algorithm on Superscalar Processors
Authors
Abstract
Modern microprocessors schedule instructions dynamically in order to exploit instruction-level parallelism. Improving the scheduler's ability to find that parallelism requires a larger instruction window. However, the window is difficult to enlarge without seriously hurting processor performance, because the instruction window is one of the dominant factors determining the processor cycle time. The window is critical because it is implemented with content-addressable memory (CAM). In general, RAMs have shorter access times and lower power dissipation than CAMs, so it is desirable to replace the CAM-based instruction window with a RAM-based one. This paper proposes such an instruction window, named the explicit data forwarding instruction window. Simulation results show that the proposed instruction window achieves performance comparable to that of the conventional instruction window, while potentially benefiting from a shorter cycle time.
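The contrast between CAM and RAM wakeup can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's explicit data forwarding design. In the CAM window, every completing result tag is compared against every source field. In the RAM window, each consumer is assumed to record its slot index under its producer's tag at dispatch, so wakeup becomes an indexed read. The class names `Entry`, `CamWindow`, and `DirectWindow`, and the `max_consumers` limit on successor fields (a dispatch that would overflow it is treated as a stall), are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One instruction-window slot in this toy model (hypothetical names)."""
    srcs: tuple[int, ...]                          # tags of source operands it waits on
    ready: set[int] = field(default_factory=set)   # source tags already woken up

    def is_ready(self) -> bool:
        # The instruction may issue once every source tag has been seen.
        return set(self.srcs) <= self.ready


class CamWindow:
    """Associative wakeup: each result tag is compared with every source field."""

    def __init__(self) -> None:
        self.slots: list[Entry] = []

    def dispatch(self, entry: Entry) -> bool:
        self.slots.append(entry)                   # a CAM window never stalls on tags here
        return True

    def wakeup(self, tag: int) -> int:
        """Broadcast `tag`; return the number of tag comparisons performed."""
        comparisons = 0
        for e in self.slots:
            for s in e.srcs:
                comparisons += 1                   # one comparator per source field
                if s == tag:
                    e.ready.add(tag)
        return comparisons


class DirectWindow:
    """Indexed wakeup: a RAM table maps each tag to the slots that consume it.

    Assumption: each producer has only `max_consumers` successor fields; a
    dispatch that would need one more is refused, modelling a stall.
    """

    def __init__(self, max_consumers: int = 2) -> None:
        self.slots: list[Entry] = []
        self.consumers: dict[int, list[int]] = {}  # tag -> consumer slot indices
        self.max_consumers = max_consumers

    def dispatch(self, entry: Entry) -> bool:
        tags = set(entry.srcs)                     # register each distinct source tag once
        if any(len(self.consumers.get(t, [])) >= self.max_consumers for t in tags):
            return False                           # no free successor field: stall
        idx = len(self.slots)
        self.slots.append(entry)
        for t in tags:
            self.consumers.setdefault(t, []).append(idx)
        return True

    def wakeup(self, tag: int) -> int:
        """Wake consumers of `tag` by index; return the number of RAM reads."""
        reads = 0
        for idx in self.consumers.pop(tag, []):
            reads += 1                             # one indexed access per consumer
            self.slots[idx].ready.add(tag)
        return reads
```

Take a window holding consumers with sources (1, 2), (1,), and (3,). Broadcasting tag 1 costs four comparisons in `CamWindow` but only two indexed reads in `DirectWindow`. The price of the indexed scheme in this sketch is that `DirectWindow.dispatch` returns `False` (a stall) once a producer's successor fields are full.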
Related articles
Optimizing Matrix-matrix Multiplication for an Embedded VLIW Processor
The optimization of matrix-matrix multiplication (MMM) performance has been well studied on conventional general-purpose processors like the Intel Pentium 4. Fast algorithms, such as those in the Goto and ATLAS BLAS libraries, exploit common microarchitectural features including superscalar execution and the cache and TLB hierarchy to achieve near-peak performance. However, the microarchitectur...
A Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors
Multiprocessor system evaluation has traditionally been based on direct-execution based Execution-Driven Simulations (EDS). In such environments, the processor component of the system is not fully modeled. With wide-issue superscalar processors being the norm in today's multiprocessor nodes, there is an urgent need for modeling the processor accurately. However, using direct-execution to model...
New Formulation and Solution in PCB Assembly Systems with Parallel Batch Processors
This paper considers the scheduling problem of parallel batch processing machines with non-identical job size and processing time. In this paper, a new mathematical model with ready time and batch size constraints is presented to formulate the problem mathematically, in which simultaneous reduction of the makespan and earliness-tardiness is the objective function. In recent years, the nature-in...
Tradeoffs in Processor/Memory Interfaces for Superscalar Processors
The current scheme of dealing with data cache misses is not well-suited for superscalar processors. In this scheme, the processor is blocked by holding its clock low until the missing cache block can be fetched from memory and inserted into the cache. From the processor's viewpoint, the miss did not occur. From the user's viewpoint, the execution time was lengthened in direct proportion to the ...
A Time Stamping Algorithm for Computing the Critical Path of Program Execution on Superscalar Processors
The increasing complexity of modern superscalar processors makes the evaluation of new designs more difficult. Current simulators such as Stanford’s SimOS [16] and the University of Wisconsin’s SimpleScalar Toolset [2] perform detailed cycle-level simulation of the processor to obtain performance measurements at the cost of very slow simulation times. This report presents and analyzes an algori...